
[DRAFT] Add support for lambda column capture #21323

Draft
gstvg wants to merge 50 commits into apache:main from gstvg:lambda_capture


@gstvg (Contributor) commented Apr 2, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate spark labels Apr 2, 2026
} else if let Some(lambda_variable) =
    expr.as_any().downcast_ref::<LambdaVariable>()
{
    used_column_indices.insert(lambda_variable.index());
@rluvaton (Member), Apr 5, 2026


I'm 98% sure this has a bug: lambda variable indices and column indices can conflict. Even if you separate lambda variable indices from column indices, you can still run into problems with nested lambdas, where a variable from an outer lambda is used inside a nested one.
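To make the concern concrete, here is a minimal sketch under an assumption that may not match the PR: lambda variables are numbered in their own scope starting at 0, separately from the input columns. `Expr`, `used_indices_naive` and `used_column_indices` are hypothetical stand-ins, not DataFusion's `PhysicalExpr` API. For a lambda such as `x -> x + c`, where `x` is lambda variable 0 and `c` is input column 1, merging both index spaces into one set reports column 0 as used even though it is never referenced.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified expression tree (not DataFusion's `PhysicalExpr`).
enum Expr {
    /// Index into the input (outer) schema.
    Column(usize),
    /// Index into a separately numbered lambda scope, starting at 0.
    LambdaVariable(usize),
    /// Any expression with children, e.g. `x + c`.
    Call(Vec<Expr>),
}

/// Naive collection that merges both index spaces into a single set.
fn used_indices_naive(expr: &Expr, out: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) | Expr::LambdaVariable(i) => {
            out.insert(*i);
        }
        Expr::Call(args) => {
            for arg in args {
                used_indices_naive(arg, out);
            }
        }
    }
}

/// Collects only genuine column references, ignoring lambda variables.
fn used_column_indices(expr: &Expr, out: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) => {
            out.insert(*i);
        }
        Expr::LambdaVariable(_) => {}
        Expr::Call(args) => {
            for arg in args {
                used_column_indices(arg, out);
            }
        }
    }
}
```

The same collision arises between nested lambdas if each scope numbers its own variables from 0, which is the second case raised above.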

Contributor Author


I added a sqllogictest test that I hope covers all the cases you mentioned, and more (4932cae).

Compared to your snippet at #21231 (comment), where lambda variables come first in the scoped schema and external columns follow them, here lambda variables are appended to the end of the outer schema, which still includes unreferenced columns. If there is a name conflict (a lambda variable shadows a field from the outer schema), we rename the shadowed field to a unique name (5c5ca19#diff-a3e127629e9516ec496d656ebb53a1e8bf730eb02d219c4ce42ee47572685844R253-R325, 5c5ca19#diff-7fb0a64e734f54d94d48e9e02c51573a3678205f9ee8e2afaf41d686187a285eR440-R489).

That way, once a field has been introduced into the schema, whether it is a column in the outermost schema or a lambda variable in an inner schema, its index never changes, no matter how many new scopes are derived from it further down the tree. Because of that, the CASE WHEN optimization (and the same optimization in lambdas) can safely collect all indices and assume that any index out of bounds for the scoped batch it is projecting refers to an inner lambda variable that is not yet available.

It still needs to rewrite all of them, since they were originally computed against the unprojected, full schema. Any projection of an outer schema shifts the indices of all its derived inner schemas, so every projection must be propagated down the tree (an inner projection cannot know how to rewrite indices for an outer projection).
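A minimal sketch of the scheme described above, under simplifying assumptions: schemas are modeled as a plain `Vec<String>` of field names rather than Arrow schemas, and `enter_lambda_scope`, `projection_remap` and the `#shadowed` suffix are hypothetical names for illustration, not the PR's code. The first function shows that appending lambda parameters, and renaming shadowed fields in place, keeps every existing index stable. The second shows how indices collected against the full schema could be remapped when an outer batch is projected, treating out-of-bounds indices as inner lambda variables.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: derives an inner lambda scope from `outer` by appending
/// the lambda parameters at the end. An existing field with the same name as a
/// parameter is renamed in place, so every pre-existing field keeps its index.
fn enter_lambda_scope(outer: &[String], params: &[&str]) -> Vec<String> {
    let mut fields: Vec<String> = outer.to_vec();
    for param in params {
        for (i, field) in fields.iter_mut().enumerate() {
            if field.as_str() == *param {
                // Illustrative renaming; the suffix includes the field's index,
                // so two renamed fields never end up with the same name.
                *field = format!("{param}#shadowed{i}");
            }
        }
        // The lambda variable takes the next free index, after all outer fields.
        fields.push(param.to_string());
    }
    fields
}

/// Hypothetical helper: `used` holds every index referenced anywhere in the
/// expression tree, computed against the full (unprojected) schemas, and
/// `outer_width` is the number of fields in the outer batch being projected.
/// Returns the outer columns to keep and an old -> new index mapping.
fn projection_remap(
    used: &BTreeSet<usize>,
    outer_width: usize,
) -> (Vec<usize>, HashMap<usize, usize>) {
    // In-bounds indices are real outer columns; keep them in schema order.
    let kept: Vec<usize> = used.iter().copied().filter(|&i| i < outer_width).collect();
    let mut remap: HashMap<usize, usize> =
        kept.iter().enumerate().map(|(new, &old)| (old, new)).collect();
    // Out-of-bounds indices refer to lambda variables appended by inner scopes;
    // dropping outer columns shifts all of them down by the same amount.
    let dropped = outer_width - kept.len();
    for &i in used.iter().filter(|&&i| i >= outer_width) {
        remap.insert(i, i - dropped);
    }
    (kept, remap)
}
```

For example, with outer fields `[a, b, c]` and used indices `{1, 3}` (index 3 being a lambda variable `x`), only `b` is kept: `b` moves to 0 and `x` moves from 3 to 1, matching the projected inner schema `[b, x]`. Because the shift depends only on the outer projection, it has to be applied to every inner scope, which is why the rewrite is propagated down the tree.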


3 participants